8/27/2020

Agenda for today

  • Introduce ourselves
  • Introduce the course
  • Go through the syllabus
  • Talk about some introductory concepts

Zoom etiquette

  • Please mute your microphone while in the main meeting

  • In breakout rooms, unmute and turn your video on if possible

  • If you have a question or comment, type it in the chat

  • You can also raise your hand virtually in the participants tab

Please be aware that the class is being recorded

About me

  • From Udine, Italy
  • BSc in Engineering between Milan, Italy and Paris, France
  • MSc in Statistics in Milan, Italy
  • PhD Candidate in SDS
  • My teaching experience: …
  • My research: clustering methods for Bayesian statistics

About you

  • Name
  • Major
  • Hometown
  • Interest in SDS 323
  • …

This course

  • A “new” course offered by SDS since 2018
  • Intersection of statistical inference and machine learning
  • The aim is to teach you how to use methods and tools, not how to invent or build them
  • Understand the meaning, strengths, and weaknesses of various models, not the underlying theory
  • Hands-on experience via the statistical software \(\texttt{R}\)

Materials will be posted on Canvas:

  • Syllabus
  • Lecture Slides
  • Homework
  • R code

If you are on the waitlist (WL), talk to me during office hours after class!

Homework groups


  • Homework assignments can be done in groups of up to 4 people
  • If you already know some of your peers and want to form a group, e-mail me before next Tuesday
  • If you do not know anyone, do not worry! I will manually create groups of 4 people and do my best to make them heterogeneous, so that each group combines different strengths

What I will need from you

  • Motivation to participate in class
  • Excitement about homework
  • Time management
  • Interest in your own set of problems

Temperature check

What .gif best describes your feelings towards this semester?

[Four .gif images, numbered 1 to 4]

Syllabus

Introduction to Statistical Learning

What is Statistics?

“All models are wrong, but some are useful.” - George E. P. Box

A set of scientific tools to collect data and turn it into information, knowledge, evidence, and insights. It should result in better understanding, judgment, and decisions about a phenomenon.

What is it that we “learn” in Statistics?

  • Patterns
  • Correlations
  • Predictions
  • …

Most of this course

Supervised learning

Predicting, or estimating, an output \(y\) based on one or more inputs \(x\).

  • Given past data on outcomes \(y\) paired with features \(x\), can we find patterns that allow us to predict \(y\) using \(x\)?
  • Key characteristic: there is a single privileged outcome \(y\)
  • Example: a house has \(3\) bedrooms (\(x_{1}\)), \(2\) bathrooms (\(x_{2}\)), \(2100\) square feet (\(x_{3}\)), and is located in Hyde Park (\(x_{4}\)). What price (\(y\)) should it sell for?
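
To make the house example concrete, here is a minimal sketch in Python (the course itself uses R). It fits a linear model by least squares, which is only one of many possible supervised methods and is chosen here purely for illustration. All numbers in the training data are invented, and the helper names `fit_linear_model` and `predict` are hypothetical.

```python
import numpy as np

# Hypothetical toy data, invented purely for illustration (not real sales).
# Columns: bedrooms (x1), bathrooms (x2), square feet (x3),
#          located in Hyde Park (x4: 1 = yes, 0 = no)
X = np.array([
    [2, 1, 1200, 0],
    [3, 2, 1800, 0],
    [3, 2, 2000, 1],
    [4, 3, 2600, 1],
    [2, 1, 1000, 1],
    [4, 2, 2400, 0],
], dtype=float)

# Hypothetical sale prices (y), in thousands of dollars.
y = np.array([250.0, 340.0, 420.0, 560.0, 270.0, 430.0])


def fit_linear_model(X, y):
    """Least-squares fit of y ~ intercept + X @ beta; returns [intercept, beta...]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, x_new):
    """Predicted outcome for a single feature vector x_new."""
    return coef[0] + np.dot(coef[1:], x_new)


coef = fit_linear_model(X, y)

# The house from the slide: 3 bedrooms, 2 bathrooms, 2100 sq ft, in Hyde Park.
new_house = np.array([3, 2, 2100, 1], dtype=float)
predicted_price = predict(coef, new_house)
print(f"Predicted price: {predicted_price:.0f} thousand dollars")
```

The past pairs \((x, y)\) are used to learn the coefficients, and the learned pattern is then applied to a new \(x\) whose \(y\) is unknown.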

Towards the end

Unsupervised learning

  • We still have multivariate data (many input variables \(x\)) and want to find patterns.
  • But there is no single privileged outcome
  • Example: “Here’s data on the shopping basket of every Whole Foods customer at \(6^{th}\) and Lamar last month. Find some patterns that we can use to improve product placement.”
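
As a minimal sketch of what "finding patterns" without an outcome \(y\) can look like, the Python snippet below (the course itself uses R) counts how often pairs of items appear in the same basket. This simple co-occurrence count is an assumption made for illustration, not necessarily a method covered in the course. The baskets are invented, and `pair_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration (not real store data).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"chips", "salsa"},
    {"bread", "milk"},
    {"chips", "salsa", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)

# Show the pairs that are bought together most often.
for pair, n in counts.most_common(3):
    print(pair, n)
```

No variable plays the role of \(y\) here: every item is treated symmetrically, and the goal is only to surface structure, such as items that might be placed near each other.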

Problem examples

  • Predict whether someone will have a heart attack on the basis of diet and other clinical measurements
  • Discover breast cancer subtypes
  • Customize an email spam detection system
  • Recognize images representing handwritten zip codes
  • Establish the relationship between salary and demographic variables in population survey data

What’s next?

The first two weeks might feel slow-paced if you already have a solid background in the prerequisites. I want to give you time to properly learn the foundations that we will need in the rest of the course.

  • Tuesday 09/01: introduction to probability
  • Thursday 09/03: first hands-on experience with R
  • Tuesday 09/08: introductory topics


The first homework will cover the prerequisites and make sure that we can start building on them. Start working on it early!